University of Texas at Austin

Upcoming Event: PhD Dissertation Defense

Fast Algorithms for Minimally Supervised Learning

Youguang Chen, Ph.D. Candidate

12–3 PM
Monday, April 13, 2026

POB 6.304

Abstract

Supervised learning has achieved remarkable success across scientific domains; however, in many practical settings labeled data are scarce, expensive, or infeasible to obtain. This dissertation develops fast, theoretically grounded algorithms for minimally supervised learning, with emphasis on statistical guarantees, spectral approximation, and scalable computation in high-dimensional regimes.

The first part studies density-based clustering. We establish a precise relationship between DBSCAN and minimum spanning trees and analyze a scalable approximation, kNN-DBSCAN, which replaces range queries with k-nearest-neighbor graphs. We characterize conditions under which the two formulations are equivalent and provide complexity analysis for a hybrid MPI/OpenMP implementation. The resulting method achieves provable scalability while preserving clustering structure in high dimensions.
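
The construction can be made concrete with a short sequential sketch in Python: range queries are replaced by a single k-nearest-neighbor search, core points are identified from their kNN distances, and clusters are read off as connected components of the core–core graph. This is our illustration of the idea, assuming k ≥ minPts and omitting border-point assignment; the dissertation's kNN-DBSCAN is a parallel hybrid MPI/OpenMP implementation, and all names below are ours.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors

def knn_dbscan(X, eps, min_pts, k):
    """Sequential sketch of the kNN-DBSCAN idea (illustrative, not the thesis code)."""
    n = X.shape[0]
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    # Core test from the kNN graph alone: with k >= min_pts, a point is core
    # iff its min_pts-th nearest neighbor (self included) lies within eps.
    core = dist[:, min_pts - 1] <= eps
    # Keep only edges of length <= eps that join two core points.
    rows = np.repeat(np.arange(n), k)
    cols, vals = idx.ravel(), dist.ravel()
    keep = (vals <= eps) & core[rows] & core[cols]
    graph = csr_matrix((np.ones(keep.sum()), (rows[keep], cols[keep])), shape=(n, n))
    # Clusters are the connected components of the symmetrized core-core graph.
    _, labels = connected_components(graph, directed=False)
    labels[~core] = -1   # noise; border-point assignment omitted in this sketch
    return labels
```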

The second part addresses representative sample selection via optimal experimental design. Building on regret minimization and online convex optimization, we derive near-optimal guarantees for A-, D-, E-, V-, and G-optimal objectives. Using matrix inequalities and spectral arguments, including applications of the Golden–Thompson inequality, we establish performance bounds for entropy-regularized and ℓ1/2-regularized relaxations. These results yield computationally tractable algorithms with approximation guarantees for large-scale design problems.
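
As a toy illustration of what an entropy-regularized relaxation means in this setting, the sketch below maximizes the continuous D-optimal objective log det M(w), with M(w) = Σᵢ wᵢ xᵢ xᵢᵀ, over the probability simplex by exponentiated gradient ascent (entropy mirror descent). This is a standard scheme shown only to make the objective concrete; it is not the dissertation's regret-minimization algorithm, and the step size and names are our choices.

```python
import numpy as np

def d_optimal_weights(X, steps=200, eta=0.5):
    """Exponentiated-gradient ascent on log det M(w) over the simplex."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                        # uniform initial design
    for _ in range(steps):
        M = X.T @ (w[:, None] * X)                 # information matrix M(w)
        Minv = np.linalg.inv(M)
        g = np.einsum("ij,jk,ik->i", X, Minv, X)   # grad_i = x_i^T M^{-1} x_i
        w *= np.exp(eta * (g - g.max()))           # multiplicative (entropy) step
        w /= w.sum()                               # stay on the simplex
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w = d_optimal_weights(X)
print("support size:", int(np.sum(w > 1e-3)))      # optimal designs concentrate on few points
```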

The third part develops Fisher Information Ratio Active Learning (FIRAL), a scalable algorithm for pool-based active learning in multinomial logistic regression. Under sub-Gaussian assumptions, we derive non-asymptotic upper and lower bounds relating the Fisher Information Ratio to excess risk, and prove near-optimal performance guarantees for the proposed relaxation and rounding procedures. To address computational bottlenecks, we introduce Approx-FIRAL, which exploits structured Hessian approximations and preconditioned conjugate gradient methods to reduce storage and computational complexity while preserving statistical guarantees.
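
The main computational primitive behind such Hessian-based methods can be illustrated without any of the dissertation's specific approximations: the multinomial logistic Hessian never needs to be formed, since its action on a vector costs only a few matrix products, and linear systems with it can then be solved by conjugate gradient. The sketch below is our generic illustration, using plain (unpreconditioned) CG and a small ridge term for positive definiteness; the structured approximation and preconditioner of Approx-FIRAL are not reproduced here.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def softmax(Z):
    Z = Z - Z.max(axis=0, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def hessian_operator(W, X, lam=1e-3):
    """Matrix-free (C*d x C*d) Hessian of the ridge-regularized multinomial
    logistic loss at weights W (C x d) on data X (n x d)."""
    C, d = W.shape
    P = softmax(W @ X.T)                           # class probabilities, (C, n)
    def matvec(v):
        V = v.reshape(C, d)
        A = V @ X.T                                # a_n = V x_n for every sample
        B = P * A
        S = B - P * B.sum(axis=0, keepdims=True)   # (diag(p_n) - p_n p_n^T) a_n
        return (S @ X).ravel() + lam * v           # sum_n (...) x_n^T, plus ridge
    return LinearOperator((C * d, C * d), matvec=matvec)

rng = np.random.default_rng(0)
X, W = rng.standard_normal((500, 10)), rng.standard_normal((3, 10))
b = rng.standard_normal(3 * 10)
sol, info = cg(hessian_operator(W, X), b)          # solve H v = b without forming H
print("CG converged:", info == 0)
```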

The final part investigates Bayesian inverse problems with approximate forward operators. We propose novel independent Metropolis–Hastings schemes, Latent-IMH and Proximal-IMH, that correct operator approximation error through spectral alignment and Gauss–Newton-type transformations. We quantify the impact of operator discrepancies via Jacobian log-determinant diagnostics and expected Kullback–Leibler divergence, and establish conditions under which the resulting samplers retain accuracy while significantly reducing the number of forward and inverse PDE solves.
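
For orientation, the accept/reject mechanics shared by both schemes are those of generic independent Metropolis–Hastings: proposals are drawn i.i.d. from a fixed surrogate distribution q (in the setting above, one built from the cheap approximate operator), and the Metropolis correction makes the chain target the exact posterior π. The sketch below shows only this generic scheme on a toy Gaussian example; the Latent-IMH and Proximal-IMH constructions of q are not reproduced, and all names are ours.

```python
import numpy as np

def imh(log_pi, log_q, sample_q, n_steps, rng):
    """Independent Metropolis-Hastings targeting pi with fixed proposal q."""
    x = sample_q(rng)
    lw = log_pi(x) - log_q(x)                      # log importance weight w(x)
    chain, accepted = [], 0
    for _ in range(n_steps):
        y = sample_q(rng)                          # proposal independent of current x
        lw_y = log_pi(y) - log_q(y)
        if np.log(rng.uniform()) < lw_y - lw:      # accept w.p. min(1, w(y)/w(x))
            x, lw = y, lw_y
            accepted += 1
        chain.append(x)
    return np.array(chain), accepted / n_steps

# Toy check: exact target N(1, 1), heavier-tailed surrogate proposal N(0, 4).
rng = np.random.default_rng(0)
chain, rate = imh(lambda x: -0.5 * (x - 1.0) ** 2,
                  lambda x: -0.5 * (x / 2.0) ** 2,
                  lambda r: 2.0 * r.standard_normal(), 20000, rng)
print(f"posterior mean ~ {chain.mean():.2f}, acceptance rate ~ {rate:.2f}")
```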

Across clustering, optimal design, active learning, and Bayesian inference, this dissertation advances a unified principle: combining spectral approximation theory, regret analysis, and scalable numerical linear algebra enables minimally supervised learning algorithms with provable statistical guarantees and high computational efficiency.

 

Event information

Date: Monday, April 13, 2026, 12–3 PM
Location: POB 6.304
Hosted by: George Biros